 Almería Province


How do Machine Learning Models Change?

Castaño, Joel, Cabañas, Rafael, Salmerón, Antonio, Lo, David, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence

The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) enable the development, sharing, and deployment of these models, fostering an evolving ecosystem. While previous studies have examined aspects of models hosted on platforms like HF, a comprehensive longitudinal study of how these models change remains underexplored. This study addresses this gap by utilizing both repository mining and longitudinal analysis methods to examine over 200,000 commits and 1,200 releases from over 50,000 models on HF. We replicate and extend an ML change taxonomy for classifying commits and utilize Bayesian networks to uncover patterns in commit and release activities over time. Our findings indicate that commit activities align with established data science methodologies, such as CRISP-DM, emphasizing iterative refinement and continuous improvement. Additionally, release patterns tend to consolidate significant updates, particularly in documentation, distinguishing between granular changes and milestone-based releases. Furthermore, projects with higher popularity prioritize infrastructure enhancements early in their lifecycle, and those with intensive collaboration practices exhibit improved documentation standards. These and other insights enhance the understanding of model changes on community platforms and provide valuable guidance for best practices in model maintenance.


Applying Data Driven Decision Making to rank Vocational and Educational Training Programs with TOPSIS

Conejero, J. M., Preciado, J. C., Prieto, A. E., Bas, M. C., Bolos, V. J.

arXiv.org Artificial Intelligence

The 2008 financial crisis that hit the world's economies has had a particularly acute impact in Spain (Guardiola and Guillen-Royo, 2015). Only since 2014 has Spain seemed to begin its recovery (Martí and Pérez, 2015). However, this recovery is still far from acceptable with regard to the labor landscape (Casares and Vázquez, 2018). One of the main Spanish weaknesses that the crisis exposed was the so-called duality of the labor market. Thus, Spain is characterized by the existence of two very different types of workers. On the one hand, long-term workers on indefinite contracts, having both very high job security and a very high cost for companies (especially in terms of dismissals), and usually with university studies even for jobs that do not require them.


NLP for The Greek Language: A Longer Survey

Papantoniou, Katerina, Tzitzikas, Yannis

arXiv.org Artificial Intelligence

There is a wide variety of methods, tools and resources for processing text in the English language. However, this is not the case for the Greek language, even though it has a long documented history spanning at least 3,400 years of written records (including texts in syllabic script), and 28 centuries (Archaic period - new) of written text with alphabet [1, 2]. The literary tradition of Greek, spanning over 2,500 years, is also notable. To aid those who are interested in using, developing or advancing the techniques for Greek processing, in this paper we survey related works and resources organized in categories. We hope this collection and categorization of works will be useful for students and researchers interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.


A flexible framework for accurate LiDAR odometry, map manipulation, and localization

Blanco-Claraco, José Luis

arXiv.org Artificial Intelligence

LiDAR-based SLAM is a core technology for autonomous vehicles and robots. Despite the intense research activity in this field, each proposed system uses a particular sensor post-processing pipeline and a single map representation format. The present work aims at introducing a revolutionary point of view for 3D LiDAR SLAM and localization: (1) using view-based maps as the fundamental representation of maps ("simple-maps"), which can then be used to generate arbitrary metric maps optimized for particular tasks; and (2) introducing a new framework in which mapping pipelines can be defined without coding, defining the connections of a network of reusable blocks much like deep-learning networks are designed by connecting layers of standardized elements. Moreover, the idea of including the current linear and angular velocity vectors as variables to be optimized within the ICP loop is also introduced, leading to superior robustness against aggressive motion profiles without an IMU. The presented open-source ecosystem, released to ROS 2, includes tools and prebuilt pipelines covering all the way from data acquisition to map editing and visualization, real-time localization, loop-closure detection, or map georeferencing from consumer-grade GNSS receivers. Extensive experimental validation reveals that the proposal compares well to, or improves, former state-of-the-art (SOTA) LiDAR odometry systems, while also successfully mapping some hard sequences where others diverge. A proposed self-adaptive configuration has been used, without parameter changes, for all 3D LiDAR datasets with sensors between 16 and 128 rings, extensively tested on 83 sequences over more than 250 km of automotive, hand-held, airborne, and quadruped LiDAR datasets, both indoors and outdoors. The open-sourced implementation is available online at https://github.com/MOLAorg/mola


Revisiting the Efficacy of Signal Decomposition in AI-based Time Series Prediction

Jiang, Kexin, Wu, Chuhan, Chen, Yaoran

arXiv.org Artificial Intelligence

Time series prediction is a fundamental problem in scientific exploration, and artificial intelligence (AI) technologies have substantially bolstered its efficiency and accuracy. A well-established paradigm in AI-driven time series prediction is injecting physical knowledge into neural networks through signal decomposition methods, and sustained progress has been reported in numerous scenarios. However, we uncover non-negligible evidence that challenges the effectiveness of signal decomposition in AI-based time series prediction. We confirm that improper dataset processing with subtle future label leakage is unfortunately widely adopted, possibly yielding abnormally superior but misleading results. When data are processed in a strictly causal way, without any future information, the effectiveness of the additional decomposed signals diminishes. Our work probably identifies an ingrained and universal error in time series modeling, and the de facto progress in relevant areas is expected to be revisited and calibrated to prevent future scientific detours and minimize practical losses.
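The kind of leakage described in this abstract can be illustrated with a small sketch. It is not the authors' experimental setup: the abstract does not say which decomposition methods were examined, so a moving-average trend is used here as a stand-in, and the function names and the toy series are hypothetical. The sketch checks whether the decomposed value at time t changes when only values after t are altered.

```python
def centered_moving_average(series, window):
    """Non-causal trend: each point averages neighbours on both sides,
    so the value at time t depends on samples after t."""
    half = window // 2
    out = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def causal_moving_average(series, window):
    """Causal trend: each point averages only the current and past samples."""
    out = []
    for t in range(len(series)):
        lo = max(0, t - window + 1)
        out.append(sum(series[lo:t + 1]) / (t + 1 - lo))
    return out


def depends_on_future(trend_fn, series, window, t):
    """Return True if the trend at time t changes when samples after t change."""
    altered = series[:t + 1] + [x + 100.0 for x in series[t + 1:]]
    return trend_fn(series, window)[t] != trend_fn(altered, window)[t]


# Toy series (made up for illustration only).
series = [float(i % 7) for i in range(30)]
leaky = depends_on_future(centered_moving_average, series, 5, 10)
clean = depends_on_future(causal_moving_average, series, 5, 10)
print("centered trend uses future samples:", leaky)
print("causal trend uses future samples:", clean)
```

If a decomposed signal computed over the whole series is fed to a forecaster as an input feature, a check of this kind flags whether that feature already encodes the values the model is asked to predict.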


Advanced simulation-based predictive modelling for solar irradiance sensor farms

Risco-Martín, José L., Prado-Rujas, Ignacio-Iker, Campoy, Javier, Pérez, María S., Olcoz, Katzalin

arXiv.org Artificial Intelligence

As solar power continues to grow and replace traditional energy sources, the need for reliable forecasting models becomes increasingly important to ensure the stability and efficiency of the grid. However, the management of these models still needs to be improved, and new tools and technologies are required to handle the deployment and control of solar facilities. This work introduces a novel framework named Cloud-based Analysis and Integration for Data Efficiency (CAIDE), designed for real-time monitoring, management, and forecasting of solar irradiance sensor farms. CAIDE is designed to manage multiple sensor farms simultaneously while improving predictive models in real-time using well-grounded Modeling and Simulation (M&S) methodologies. The framework leverages Model Based Systems Engineering (MBSE) and an Internet of Things (IoT) infrastructure to support the deployment and analysis of solar plants in dynamic environments. The system can adapt and re-train the model when given incorrect results, ensuring that forecasts remain accurate and up-to-date. Furthermore, CAIDE can be executed in sequential, parallel, and distributed architectures, assuring scalability. The effectiveness of CAIDE is demonstrated in a complex scenario composed of several solar irradiance sensor farms connected to a centralized management system. Our results show that CAIDE is scalable and effective in managing and forecasting solar power production while improving the accuracy of predictive models in real time. The framework has important implications for the deployment of solar plants and the future of renewable energy sources.


The GREENBOT dataset: Multimodal mobile robotic dataset for a typical Mediterranean greenhouse

Cañadas-Aránega, Fernando, Blanco-Claraco, Jose Luis, Moreno, Jose Carlos, Rodriguez, Francisco

arXiv.org Artificial Intelligence

This paper introduces an innovative dataset specifically crafted for challenging agricultural settings (a greenhouse), where achieving precise localization is of paramount importance. The dataset was gathered using a mobile platform equipped with a set of sensors typically used in mobile robots, as it was moved through all the corridors of a typical Mediterranean greenhouse featuring a tomato crop. This dataset presents a unique opportunity for constructing detailed 3D models of plants in such indoor-like spaces, with potential applications such as robotized spraying. For the first time, to the best of the authors' knowledge, a dataset suitable for testing Simultaneous Localization and Mapping (SLAM) methods is presented in a greenhouse environment, which poses unique challenges. The suitability of the dataset for this goal is assessed by presenting SLAM results with state-of-the-art algorithms. The dataset is available online at https://arm.ual.es/arm-group/dataset-greenhouse-2024/.


Shrub of a thousand faces: an individual segmentation from satellite images using deep learning

Khaldi, Rohaifa, Tabik, Siham, Puertas-Ruiz, Sergio, de Giles, Julio Peñas, Correa, José Antonio Hódar, Zamora, Regino, Segura, Domingo Alcaraz

arXiv.org Artificial Intelligence

Monitoring the distribution and size structure of long-living shrubs, such as Juniperus communis, can be used to estimate the long-term effects of climate change on high-mountain and high-latitude ecosystems. Historical aerial very-high resolution imagery offers a retrospective tool to monitor shrub growth and distribution at high precision. Currently, deep learning models provide impressive results for detecting and delineating the contour of objects with defined shapes. However, adapting these models to detect natural objects that express complex growth patterns, such as junipers, is still a challenging task. This research presents a novel approach that leverages remotely sensed RGB imagery in conjunction with Mask R-CNN-based instance segmentation models to individually delineate Juniperus shrubs above the treeline in Sierra Nevada (Spain). In this study, we propose a new data construction design that consists of using photo-interpreted (PI) and field work (FW) data to respectively develop and externally validate the model. We also propose a new shrub-tailored evaluation algorithm based on a new metric called Multiple Intersections over Ground Truth Area (MIoGTA) to assess and optimize the model's shrub delineation performance. Finally, we deploy the developed model for the first time to generate a wall-to-wall map of Juniperus individuals. The experimental results demonstrate the efficiency of our dual data construction approach in overcoming the limitations associated with traditional field survey methods. They also highlight the robustness of the MIoGTA metric in evaluating instance segmentation models on species with complex growth patterns, showing more resilience against data annotation uncertainty. Furthermore, they show the effectiveness of employing Mask R-CNN with a ResNet101-C4 backbone in delineating PI and FW shrubs, achieving an F1-score of 87.87% and 76.86%, respectively.


Uncertainty-Aware Calibration of a Hot-Wire Anemometer With Gaussian Process Regression

García-Ruiz, Rubén Antonio, Blanco-Claraco, José Luis, López-Martínez, Javier, Callejón-Ferre, Ángel Jesús

arXiv.org Artificial Intelligence

Expensive ultrasonic anemometers are usually required to measure wind speed accurately. The aim of this work is to overcome the loss of accuracy of a low-cost hot-wire anemometer caused by changes in air temperature, by means of a probabilistic calibration using Gaussian Process Regression. Gaussian Process Regression is a non-parametric, Bayesian, and supervised learning method designed to make predictions of an unknown target variable as a function of one or more known input variables. Our approach is validated against real datasets, obtaining good performance in inferring the actual wind speed values. Calibrating the hot-wire anemometer before its real use in the field, taking air temperature into account, allows the wind speed to be estimated over the typical range of ambient temperatures, including a grounded uncertainty estimate for each speed measurement.
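As a rough illustration of how Gaussian Process Regression yields both a prediction and an uncertainty, the following is a minimal sketch of a zero-mean GP with a squared-exponential kernel. It is not the authors' calibration: the choice of kernel, the hyperparameters, the input layout (a normalized sensor reading paired with a normalized air temperature), and the toy numbers are all assumptions made for illustration.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two input vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_lower_transposed(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_star, noise=1e-2):
    """Posterior mean and variance of the latent function at x_star."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))
    k_star = [rbf(x, x_star) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


# Hypothetical calibration points: (normalized reading, normalized temperature)
# mapped to a normalized reference wind speed. Values are made up.
X = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
y = [0.0, 0.4, 1.0, 0.1, 0.6, 1.3]

mean_near, var_near = gp_predict(X, y, (0.5, 0.5))
mean_far, var_far = gp_predict(X, y, (5.0, 5.0))
print("inside calibrated range: mean", mean_near, "variance", var_near)
print("far outside the range:   mean", mean_far, "variance", var_far)
```

The predictive variance is smaller at a query surrounded by calibration points than at one far outside them, which is the behaviour that makes a per-measurement uncertainty estimate possible.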


Efficient Computation of Counterfactual Bounds

Zaffalon, Marco, Antonucci, Alessandro, Cabañas, Rafael, Huber, David, Azzimonti, Dario

arXiv.org Artificial Intelligence

We assume we are given structural equations over discrete variables inducing a directed acyclic graph, namely, a structural causal model, together with data about its internal nodes. The question we want to answer is how we can compute bounds for partially identifiable counterfactual queries from such an input. We start by giving a map from structural causal models to credal networks. This allows us to compute exact counterfactual bounds via algorithms for credal nets on a subclass of structural causal models. Exact computation is going to be inefficient in general given that, as we show, causal inference is NP-hard even on polytrees. We then target approximate bounds via a causal EM scheme. We evaluate their accuracy by providing credible intervals on the quality of the approximation; we show through a synthetic benchmark that the EM scheme delivers accurate results in a fair number of runs. In the course of the discussion, we also point out what seems to be a neglected limitation to the trending idea that counterfactual bounds can be computed without knowledge of the structural equations. We also present a real case study on palliative care to show how our algorithms can readily be used for practical purposes.